Interim  Report  A 

High-Speed Fixed and Floating Point
Implementation  of  Delta-Operator  Formulated 

Discrete  Time  Systems 


work  performed  for: 


The  Office  of  Naval  Research 

AD-A283  109 


This document has been approved for public release and sale; its
distribution is unlimited.


January  1, 1994  -  June  30, 1994 
Principal  Investigator: 

Professor Peter H. Bauer

Department of Electrical Engineering
University  of  Notre  Dame 
Notre  Dame,  IN  46556 




REPORT  DOCUMENTATION  PAGE 


Form Approved
OMB No. 0704-0188



08/05/94 


Interim 


4. TITLE AND SUBTITLE


High-Speed Fixed and Floating Point Implementation of
Delta-Operator Formulated Discrete Time Systems


Peter H. Bauer


01/01 


5. FUNDING NUMBERS
Grant #:

N00014-94-1-0387
Project Code:
3148509-01


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Dept. of Electrical Engineering
University  of  Notre  Dame 
Notre  Dame,  IN  46556 


8. PERFORMING ORGANIZATION
REPORT NUMBER


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

Office  of  Naval  Research 
Code  251  :  Jwk 
Ballston Tower One
800 N. Quincy Street


11. SUPPLEMENTARY NOTES

Report was prepared in cooperation with Prof. K. Premaratne, Dept. of
Electrical  &  Computer  Engr.,  Univ.  of  Miami,  Coral  Gables,  FL  33124 


12a. DISTRIBUTION/AVAILABILITY STATEMENT


10. SPONSORING/MONITORING
AGENCY REPORT NUMBER


12b. DISTRIBUTION CODE


13. ABSTRACT (Maximum 200 words) This report addresses the analysis and design of finite word- [...] q-operator counterpart. In fact, delta-operator systems always show unstable limit cycle behavior and convergence to incorrect equilibrium points, independent of the choice of the realization or the sampling time. The coefficient sensitivity of delta-systems is still superior to that of the shift-operator. In the case of floating point arithmetic, delta-operator implementations perform consistently better than their q-operator counterparts. Delta-systems show superior quantization noise and sensitivity properties. The zero-convergence problem of the fixed point case does not occur if the mantissa length is chosen sufficiently large. Due to its attractive wordlength properties, the concept of delta-operators has been extended to the multi-dimensional case. A 2-D state space model was developed, and the notions of reachability and observability gramians and balanced realization have been introduced. The problem of directly checking stability in the delta-domain has also been addressed. Similarly to the 1-D case, the sensitivity and roundoff noise behavior was analyzed.
Summary of Phase P1 Results
Phase P1 consists of two tasks:

[T1] Task T1: Analysis and design of finite wordlength implementations of linear, time-invariant δ-systems.

[T3] Task T3: 2-D and m-D δ-system models.

The major part of task T1 was carried out at the University of Notre Dame by Dr. Peter H. Bauer, while the major part of task T3 was carried out at the University of Miami by Dr. Kamal Premaratne under grant No. N00014-94-1-0454. Since the project is an extensive collaborative effort, the two PIs have been in constant contact.

The following is a summary of the Phase P1 results.

Task T1: Analysis and Design of Finite Wordlength Implementations of Linear, Time-Invariant δ-Systems

The conclusions drawn from the work conducted for task T1 may be summarized as follows:


1. The Fixed-Point Arithmetic Case: When limit cycle performance is crucial, the q-operator implementation is preferable. The δ-operator implementation is superior with regard to coefficient sensitivity.

2. The Floating-Point Arithmetic Case: Generally, the δ-operator implementation outperforms its q-operator counterpart. In particular, in high-order and high-speed applications, the δ-operator implementation is the best choice.
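One mechanism behind the coefficient sensitivity advantage can be illustrated with a first-order example. The sketch below is a hypothetical illustration, not the report's sensitivity measure: it assumes a scalar coefficient a_δ = -1.2345678, Δ = 2^-12 and a fixed-point coefficient word with 16 fractional bits. It stores either a_q = 1 + Δ·a_δ (the scalar case of (T1.5)) or a_δ itself, and maps each stored value back to the δ-domain coefficient it actually implements. Because a_q sits close to one, its rounding error is amplified by 1/Δ when mapped back. The function names are illustrative only.

```python
from fractions import Fraction


def quantize(value: Fraction, frac_bits: int) -> Fraction:
    """Round value to the nearest multiple of 2**-frac_bits (hypothetical FXP coefficient word)."""
    scale = 2 ** frac_bits
    return Fraction(round(value * scale), scale)


def coefficient_errors(a_delta: Fraction, delta: Fraction, frac_bits: int) -> tuple:
    """Error of the effective delta-domain coefficient for a scalar system.

    Returns (error when the q-form coefficient is stored,
             error when the delta-form coefficient is stored).
    """
    a_q = 1 + delta * a_delta                      # scalar case of A_q = I + Δ·A_δ
    a_q_stored = quantize(a_q, frac_bits)          # q-operator form stores a_q
    a_delta_stored = quantize(a_delta, frac_bits)  # δ-operator form stores a_δ directly
    # Map both stored coefficients back to the a_δ they actually implement.
    err_q = abs((a_q_stored - 1) / delta - a_delta)
    err_delta = abs(a_delta_stored - a_delta)
    return err_q, err_delta


err_q, err_delta = coefficient_errors(Fraction(-12345678, 10**7), Fraction(1, 2**12), frac_bits=16)
print(f"q-form error: {float(err_q):.3e}, delta-form error: {float(err_delta):.3e}")
```

With these values the stored q-form coefficient implements a_δ = -1.25, an error of about 1.5·10^-2, while the δ-form error stays below 2^-17.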


Prior to a more detailed exposition, we first provide a qualitative justification for the above conclusions. The state equations of a δ-operator system can be written as:

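$$\delta[x](n) = A_\delta\, x(n) + B_\delta\, u(n); \qquad q[x](n) = x(n) + \Delta \cdot \delta[x](n), \tag{T1.1}$$

where x and u are the state and input vectors, respectively. Here, Δ denotes a positive real constant (typically, the sampling time). The symbol δ[·] denotes the δ-operator, that is,

$$\delta[x](n) = \frac{q[x](n) - x(n)}{\Delta} = \frac{x(n+1) - x(n)}{\Delta}, \tag{T1.2}$$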
and q[·] denotes the usual shift-operator, that is,

$$q[x](n) = x(n+1). \tag{T1.3}$$

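The corresponding formulation of (T1.1) in terms of the q-operator is

$$q[x](n) = A_q\, x(n) + B_q\, u(n), \tag{T1.4}$$

where

$$A_q = I + \Delta \cdot A_\delta \iff A_\delta = \Delta^{-1}(A_q - I) \quad \text{and} \quad B_q = \Delta \cdot B_\delta \iff B_\delta = \Delta^{-1} B_q. \tag{T1.5}$$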

Now, given x and u, both representations compute q[x] with a certain accuracy. Consider the δ-operator formulation in (T1.1). Here we encounter two errors:

1. The first is due to the computation of δ[x], that is, the first equation in (T1.1). We will refer to this equation as the intermediate equation.

2. The second is due to the eventual computation of q[x], that is, the second equation in (T1.1). We will refer to this equation as the update equation.

Let us assume that the total error in computing q[x] is mainly due to the intermediate equation in (T1.1) (rather than the update equation). Since the error in δ[x] enters the update equation multiplied by Δ, choosing Δ sufficiently small makes the total error in computing q[x] approximately equal to the error created by the update equation, which is small. In this case, the δ-operator representation has better finite wordlength properties than its q-operator counterpart in (T1.4).

If, however, the errors accumulated in the intermediate and the update equations in (T1.1) are comparable, q[x] computed through the δ-operator representation will show approximately the same error as that computed through its q-operator counterpart, assuming Δ is sufficiently small. If Δ is not sufficiently smaller than one, the δ-operator representation will actually perform worse than the q-operator representation!

If the error introduced in the update equation is larger than that in the intermediate equation, the δ-operator representation would consistently perform worse! In reality, this case is very unlikely to occur.
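The case analysis above can be summarized with a simple additive error model. This is our own sketch, not the report's derivation: e_I(n) and e_U(n) denote the errors committed in the intermediate and update equations of (T1.1), and e_q(n) the error committed when (T1.4) is evaluated directly.

$$\hat{q}[x](n) = x(n) + \Delta\,\big(\delta[x](n) + e_I(n)\big) + e_U(n) \;\Longrightarrow\; \hat{q}[x](n) - q[x](n) = \Delta\, e_I(n) + e_U(n),$$

$$\hat{q}[x](n) - q[x](n) = e_q(n) \quad \text{for the } q\text{-operator form (T1.4)}.$$

When e_I dominates, the factor Δ suppresses it and the δ-form error is close to the small update error e_U. When e_I, e_U and e_q are of similar size, the δ-form error Δe_I + e_U approaches e_q only as Δ becomes small, and can exceed it otherwise. When e_U dominates, the factor Δ offers no relief.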

Next,  a  more  detailed  exposition  follows. 

T1.1 The Fixed-Point Arithmetic Case

We now discuss some of the results regarding the fixed-point (FXP) case. Here, our results in fact indicate that, when limit cycle behavior is crucial, the δ-operator representation is NOT suitable for this arithmetic scheme [1]. Such a case may occur when nonlinear systems are implemented through FXP δ-operator based schemes.

Zero-input limit cycles. Independent of Δ, zero-input limit cycles cannot be avoided in FXP δ-implementations. This is easily explained as follows: if Δ is chosen very small, the contribution from the intermediate equation is small (since δ[x] is multiplied by Δ), so that during the update equation q[x] can be quantized to x. This creates a DC limit cycle, that is, an incorrect equilibrium point different from zero. We emphasize that most of the desirable properties of δ-operator implementations are based on a small Δ. We may also show that, if Δ is chosen larger (this case is of course somewhat less important), DC limit cycles will still exist. Hence, δ-operator representations cannot be implemented limit cycle free in FXP format! This fact is independent of the particular realization of the system.
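The mechanism can be reproduced with a first-order example. This is a minimal sketch under our own assumptions: a scalar system δ[x] = a_δ·x with a_δ = -1 and Δ = 1/64, states stored as integer multiples of the quantization step, magnitude truncation, a double-length accumulator in the intermediate equation, and quantization of Δ·δ[x] before it is added to x. The q-operator version of the same linear system uses a_q = 1 + Δ·a_δ. The function names are hypothetical.

```python
import math
from fractions import Fraction


def mag_trunc(v: Fraction) -> int:
    """Magnitude truncation to the quantization grid (states counted in steps)."""
    return math.trunc(v)


def run_q_form(x0: int, a_q: Fraction, steps: int) -> int:
    """Zero-input q-operator form: x(n+1) = Q{a_q * x(n)}."""
    x = x0
    for _ in range(steps):
        x = mag_trunc(a_q * x)
    return x


def run_delta_form(x0: int, a_delta: Fraction, delta: Fraction, steps: int) -> int:
    """Zero-input delta-operator form with Q{Δ·δ[x]} added in the update equation."""
    x = x0
    for _ in range(steps):
        d = mag_trunc(a_delta * x)      # intermediate equation
        x = x + mag_trunc(delta * d)    # update equation
    return x


a_delta, delta = Fraction(-1), Fraction(1, 64)
a_q = 1 + delta * a_delta  # same linear system in q-form, a_q = 63/64
print(run_q_form(100, a_q, 500), run_delta_form(100, a_delta, delta, 500))
```

Under these assumptions the q-form state reaches zero, whereas the δ-form state stops at 63 quantization steps: once |Δ·δ[x]| falls below one step, the update term truncates to zero and x(n+1) = x(n).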

Deadband size. Since δ-systems cannot be implemented limit cycle free in FXP format, it is of interest to investigate the size of such limit cycles since, in certain situations, small limit cycle amplitudes can be tolerated. It can be shown that the magnitude of Δ determines the magnitude of the limit cycle: the smaller Δ, the larger the deadband and hence the limit cycle magnitude. An approximate relationship regarding this is

$$\Delta \times (\text{size of deadband}) \approx 1, \tag{T1.6}$$

where the size of the deadband is measured in multiples of the quantization step size. Here, the deadband corresponds to that obtained by considering the quantization of Δ·δ[x]. Therefore, the usual choice of a small Δ creates a larger deadband!
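Relation (T1.6) can be checked numerically on a first-order example. The sketch assumes, as our own illustration, δ[x] = a_δ·x with a_δ = -1, rounding with ties away from zero, states counted in quantization steps, and Δ = 2^-k. It counts every state that the FXP update x(n+1) = x(n) + Q{Δ·Q{a_δ·x(n)}} leaves unchanged; the helper names are hypothetical.

```python
import math
from fractions import Fraction


def q_round(v: Fraction) -> int:
    """Round to the nearest quantization level, ties away from zero (assumed rule)."""
    if v >= 0:
        return math.floor(v + Fraction(1, 2))
    return -math.floor(-v + Fraction(1, 2))


def deadband_size(a_delta: Fraction, delta: Fraction, search: int) -> int:
    """Number of states x in [-search, search] (in steps) that are fixed points of
    x(n+1) = x(n) + Q{Δ·Q{a_δ·x(n)}}; the zero state is included in the count."""
    return sum(
        1
        for x in range(-search, search + 1)
        if q_round(delta * q_round(a_delta * x)) == 0
    )


for k in (4, 6, 8):
    delta = Fraction(1, 2**k)
    size = deadband_size(Fraction(-1), delta, search=1000)
    print(f"Delta = 1/{2**k}: deadband of {size} steps, Delta * size = {float(delta * size):.3f}")
```

For this example the deadband spans 1/Δ - 1 levels, so the product Δ × size equals 1 - Δ and stays close to one, in line with (T1.6).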

The input driven case. Although the input driven case is not part of the originally proposed work, some interesting results have been obtained. For small values of Δ, there exists a bounded input signal that does not allow control of the state trajectory. In other words, given sufficiently small Δ, the state trajectory may not be influenced by such an input signal.
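A minimal first-order sketch of this effect, under our own assumptions: δ[x] = a_δ·x + b_δ·u with a_δ = -1, b_δ = 1, Δ = 1/64, a constant input u, rounding with ties away from zero, and states counted in quantization steps starting from x = 0. The unquantized system would settle at x = u; the names below are hypothetical.

```python
import math
from fractions import Fraction


def q_round(v: Fraction) -> int:
    """Round to the nearest quantization level, ties away from zero (assumed rule)."""
    if v >= 0:
        return math.floor(v + Fraction(1, 2))
    return -math.floor(-v + Fraction(1, 2))


def final_state(u: int, a_delta: Fraction, b_delta: Fraction, delta: Fraction, steps: int) -> int:
    """Apply a constant input u to the FXP delta-system, starting from x = 0."""
    x = 0
    for _ in range(steps):
        d = q_round(a_delta * x + b_delta * u)  # intermediate equation (double-length accumulator)
        x = x + q_round(delta * d)              # update equation
    return x


a, b, delta = Fraction(-1), Fraction(1), Fraction(1, 64)
print(final_state(31, a, b, delta, 200), final_state(32, a, b, delta, 200))
```

With u = 31 the product Δ·δ[x] never reaches half a step, so the state never leaves zero; with u = 32 the state moves once and then stops at 1, far from the linear steady state.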

The influence of the realization. First, it was necessary to develop a suitable scheme to investigate the effect of the realization on the presence or absence of limit cycles. In this direction, for the δ-operator case, a computer-based exhaustive search algorithm that checks for limit cycles (DC and/or oscillatory) has been developed [5].
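The algorithm of [5] is not described here. The following is a hypothetical brute-force sketch of the same idea: start the zero-input FXP δ-system (magnitude truncation, update with Q{Δ·δ[x]}) from every state in a box, follow each trajectory until it reaches zero, leaves a larger escape box, or revisits a state, and record every nonzero periodic orbit found. Overflow is not modelled, and all names and parameters are our own.

```python
import itertools
import math
from fractions import Fraction


def mag_trunc(v: Fraction) -> int:
    return math.trunc(v)


def step(x, a_delta, delta, quant):
    """One zero-input step: intermediate equation, then update with Q{Δ·δ[x]}."""
    d = [quant(sum(a * xj for a, xj in zip(row, x))) for row in a_delta]
    return tuple(xi + quant(delta * di) for xi, di in zip(x, d))


def find_limit_cycles(a_delta, delta, quant, radius, escape):
    """Return all nonzero periodic orbits reached from states in [-radius, radius]^m.

    States are integer multiples of the quantization step. Trajectories leaving
    [-escape, escape]^m are abandoned. Each orbit is returned as a tuple of states
    rotated to start at its smallest state, so duplicates collapse in the set.
    """
    m = len(a_delta)
    cycles = set()
    for x0 in itertools.product(range(-radius, radius + 1), repeat=m):
        x, seen, n = tuple(x0), {}, 0
        while True:
            if not any(x):
                break                                  # reached the zero state
            if max(abs(c) for c in x) > escape:
                break                                  # left the search region
            if x in seen:
                period = n - seen[x]
                orbit = [x]
                for _ in range(period - 1):
                    orbit.append(step(orbit[-1], a_delta, delta, quant))
                k = orbit.index(min(orbit))
                cycles.add(tuple(orbit[k:] + orbit[:k]))
                break
            seen[x] = n
            x = step(x, a_delta, delta, quant)
            n += 1
    return cycles


cycles = find_limit_cycles([[Fraction(-1)]], Fraction(1, 64), mag_trunc, radius=100, escape=200)
print(len(cycles), {len(c) for c in cycles})
```

For the scalar example a_δ = -1, Δ = 1/64, the search returns 126 period-one orbits (every state with 1 ≤ |x| ≤ 63 steps). Any m×m A_δ given as nested lists of Fractions can be passed in the same way, although the cost grows as (2·radius + 1)^m.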
As discussed before, we have shown that a stable linear time-invariant δ-system cannot be implemented limit cycle free in FXP. The size of the deadband, however, also depends on the particular realization, that is, the structure of A_δ. Given a system transfer function, there are forms which minimize this deadband size with respect to some appropriately chosen measure. For example, in order to minimize the DC limit cycle amplitude, one may choose the normal form (in terms of A_δ) as a suitable candidate.

The influence of the quantization nonlinearity and its deadzone. Since a larger deadzone implies larger DC limit cycle amplitudes, the use of quantizers with reduced, or even zero, deadzone was proposed. In investigating first-order systems, it was found that reducing the deadzone can indeed reduce the occurrence of DC limit cycles. Unfortunately, other, oscillatory limit cycles are then created. This phenomenon is due to the increased gain the quantizer exhibits towards small input signals.
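Both effects appear in a first-order sketch. The report does not specify the reduced-deadzone quantizer; here we assume, purely for illustration, one that rounds every nonzero value away from zero, so it has no deadzone at all. States are counted in quantization steps and Δ = 1/64; the function names are hypothetical.

```python
import math
from fractions import Fraction


def mag_trunc(v: Fraction) -> int:
    """Magnitude truncation: a deadzone of one step on each side of zero."""
    return math.trunc(v)


def away_from_zero(v: Fraction) -> int:
    """Hypothetical zero-deadzone quantizer: every nonzero input maps to a nonzero level."""
    if v > 0:
        return math.ceil(v)
    if v < 0:
        return -math.ceil(-v)
    return 0


def trajectory(x0, a_delta, delta, quant, steps):
    """Zero-input first-order delta-system; returns [x(0), ..., x(steps)]."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        d = quant(a_delta * x)           # intermediate equation
        xs.append(x + quant(delta * d))  # update equation
    return xs


delta = Fraction(1, 64)
print(trajectory(40, Fraction(-1), delta, mag_trunc, 50)[-1])
print(trajectory(40, Fraction(-1), delta, away_from_zero, 50)[-1])
print(trajectory(1, Fraction(-120), delta, mag_trunc, 4))
print(trajectory(1, Fraction(-120), delta, away_from_zero, 4))
```

For a_δ = -1, magnitude truncation leaves the state at 40 (a DC limit cycle) while the zero-deadzone quantizer drives it to zero. For a_δ = -120 (linear q-pole 1 + Δa_δ = -0.875, stable), the roles reverse: truncation reaches zero, but the zero-deadzone quantizer oscillates between 1 and -1 because it magnifies the small value Δ·δ[x] = ∓1.875 to ∓2.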

Scaling. As discussed above, we have shown that, independent of either the form of A_δ or the magnitude of Δ, an FXP implemented δ-system cannot be free of zero-input limit cycles. Hence, scaling cannot be offered as a possible solution.

T1.2 The Floating-Point Arithmetic Case

The floating-point (FLP) implementation of δ-systems is currently under investigation. The results obtained so far are very encouraging and indicate that quantization errors due to FLP arithmetic have a much smaller effect on the system behavior than in the FXP case. In fact, preliminary results show that, for δ-systems of order three and higher, errors in computing q[x] can be made significantly smaller than for the corresponding q-systems. This is because, for an FLP implementation of such a system, errors created through the intermediate equation are larger than those created through the update equation. As previously mentioned, in this situation δ-systems behave better than their q-operator counterparts!

Limit cycles. In FLP arithmetic, a linearly stable time-invariant system under zero-input conditions may exhibit four types of responses: a diverging response, an oscillatory periodic response of arbitrary magnitude, an oscillatory periodic response in underflow, or an asymptotically stable response. Only the last two response types are acceptable in practice. It is well known that the last response type is in fact a very stringent requirement and is often not needed in practice. Results obtained so far show that, when the requirements for a response in underflow are compared, the δ-system requires less wordlength than its q-system counterpart! This advantage in fact grows with the order of the system!
Once the system reaches underflow conditions, the δ-system again exhibits DC limit cycles. However, if the exponent register is chosen sufficiently large, the amplitude of these oscillations can be made extremely small and hence, for all practical purposes, this problem is solved.

Deadband size. If the condition on the mantissa length that guarantees convergence into underflow is satisfied, then the deadband size will be very small and can be neglected for all practical purposes. This assumes a properly chosen exponent register length, since the exponent register length determines the dynamic range of underflow.
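The role of the mantissa length can be seen in a first-order sketch. The assumptions are ours: every arithmetic result is rounded to a p-bit mantissa (round-half-even), the exponent range is unlimited (so underflow itself is not modelled), a_δ = -1 and Δ = 2^-10. When the update Δ·δ[x] is small relative to the mantissa spacing of x, the sum x + Δ·δ[x] rounds back to x and the state stalls. The function names are hypothetical.

```python
import math


def round_mantissa(v: float, p: int) -> float:
    """Round v to a p-bit mantissa (round-half-even); exponent range unlimited."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)                # v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2**p), e - p)


def run_flp(x0: float, a_delta: float, delta: float, p: int, steps: int) -> float:
    """Zero-input first-order delta-system with every operation rounded to p mantissa bits."""
    x = x0
    for _ in range(steps):
        d = round_mantissa(a_delta * x, p)                       # intermediate equation
        x = round_mantissa(x + round_mantissa(delta * d, p), p)  # update equation
    return x


for p in (8, 16):
    print(p, run_flp(1.0, -1.0, 2.0**-10, p, steps=100))
```

With an 8-bit mantissa the state never moves from 1.0, a DC response of arbitrary magnitude; with 16 bits it decays approximately as (1 - Δ)^n, as the linear system predicts.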

The Influence of the Nonlinearity. Unlike the FXP case, the characteristic of the nonlinearity has only a minor effect on the system behavior, significant differences being present only in underflow conditions.

The Underflow case. In underflow, the δ-system seems to behave worse than its q-operator counterpart. This is mainly because an FLP system in underflow behaves very much like an FXP system. However, as mentioned above, if the dynamic range of underflow is chosen properly, the system behavior in underflow is of little practical interest.

Block Floating-Point Arithmetic. Even for the q-operator case, results regarding block FLP implementations are lacking. Hence, the investigation of block FLP implementations of δ-systems is in its early stages. In order to obtain a comparison between the two types of implementations, current research is geared towards obtaining results applicable to the q-operator case.

T1.3 The Multi-Dimensional Case

The results on one-dimensional (1-D) δ-operator implementations in FXP arithmetic directly carry over to the multi-dimensional (m-D) case. The existence of non-converging responses along the boundary of the causality region can easily be proven using the same type of argument as in the 1-D case. Consequently, δ-operator based implementations of m-D systems cannot be realized limit cycle free in FXP.

Task T3: 2-D and m-D δ-System Models

Discrete-time systems implemented using the δ-operator, as is clear from the discussion above, exhibit superior finite wordlength properties with FLP arithmetic. In the case of FXP arithmetic, they still provide superior coefficient sensitivity. The development of 2-D and m-D models applicable to δ-operator implementations was hence motivated by the expectation that these properties would still hold true.

The conclusions drawn from the work conducted for task T3 may be summarized as follows: similar to the 1-D case, under FLP arithmetic, the δ-operator implementation of 2-D and m-D discrete-time systems provides the best choice. Again, this is particularly true in high-order and high-speed applications.


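State-space models. The Roesser local state-space (s.s.) model of q-operator formulated 2-D discrete-time systems takes the form

$$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_q^{(1)} & A_q^{(2)} \\ A_q^{(3)} & A_q^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_q^{(1)} \\ B_q^{(2)} \end{bmatrix} u(i,j), \qquad y(i,j) = \begin{bmatrix} C_q^{(1)} & C_q^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + D_q\, u(i,j), \tag{T3.1}$$

where A_q^(1) is of size n_h × n_h, A_q^(2) is of size n_h × n_v, etc. Also, q_h[·] and q_v[·] denote the horizontal and vertical shift operators, that is,

$$q_h[x](i,j) = x(i+1,j) \quad \text{and} \quad q_v[x](i,j) = x(i,j+1). \tag{T3.2}$$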


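To exploit the advantages of δ-operator implementations, analogous to the 1-D case, we define the operators

$$\delta_h[x](i,j) = \frac{x(i+1,j) - x(i,j)}{\Delta_h} = \frac{q_h[x](i,j) - x(i,j)}{\Delta_h}, \qquad \delta_v[x](i,j) = \frac{x(i,j+1) - x(i,j)}{\Delta_v} = \frac{q_v[x](i,j) - x(i,j)}{\Delta_v}, \tag{T3.3}$$

where Δ_h and Δ_v are two positive real constants. The corresponding δ-operator s.s. model may then be obtained as

$$\begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} A_\delta^{(1)} & A_\delta^{(2)} \\ A_\delta^{(3)} & A_\delta^{(4)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \begin{bmatrix} B_\delta^{(1)} \\ B_\delta^{(2)} \end{bmatrix} u(i,j), \qquad y(i,j) = \begin{bmatrix} C_\delta^{(1)} & C_\delta^{(2)} \end{bmatrix} \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + D_\delta\, u(i,j). \tag{T3.4}$$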
This is the 2-D version of the intermediate equation mentioned earlier. In addition, as for the 1-D case, we have the following update equations:

$$q_h[x^h](i,j) = x^h(i,j) + \Delta_h \cdot \delta_h[x^h](i,j); \qquad q_v[x^v](i,j) = x^v(i,j) + \Delta_v \cdot \delta_v[x^v](i,j). \tag{T3.5}$$

Note that

$$A_q = I_n + \Delta \cdot A_\delta \iff A_\delta = \Delta^{-1}(A_q - I_n); \quad B_q = \Delta \cdot B_\delta \iff B_\delta = \Delta^{-1} B_q; \quad C_q = C_\delta; \quad D_q = D_\delta. \tag{T3.6}$$

Here, Δ = Δ_h I_{n_h} ⊕ Δ_v I_{n_v} is of size (n_h + n_v) × (n_h + n_v), and n = n_h + n_v.


The associated system-theoretic notions, such as the transition matrix, transfer function, characteristic equation, etc., have also been introduced. This s.s. model is the basis for designing 2-D filters with superior finite wordlength properties. The design procedures developed are expected to be extremely useful in obtaining high-Q 2-D and m-D digital filters that are suitable for high-speed applications.
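A minimal numerical sketch of one local step of the 2-D δ-model, assuming hypothetical matrices and exact rational arithmetic: the intermediate equation (T3.4) computes the δ-differences, the update equations (T3.5) produce x^h(i+1,j) and x^v(i,j+1), and the q-model built from the relations (T3.6) gives exactly the same result. The state is ordered as [x^h; x^v] with the first n_h entries horizontal; all names are ours.

```python
from fractions import Fraction


def block_delta(n, nh, dh, dv):
    """Diagonal of Δ = Δ_h I_{n_h} ⊕ Δ_v I_{n_v}."""
    return [dh if k < nh else dv for k in range(n)]


def roesser_delta_step(A_d, B_d, x, u, dh, dv, nh):
    """Return [q_h[x^h](i,j); q_v[x^v](i,j)] = [x^h(i+1,j); x^v(i,j+1)] via the delta-model."""
    n = len(x)
    lam = block_delta(n, nh, dh, dv)
    # Intermediate equation: delta-differences of the local state
    d = [sum(A_d[r][c] * x[c] for c in range(n)) + sum(B_d[r][k] * u[k] for k in range(len(u)))
         for r in range(n)]
    # Update equations: x + Δ·δ[x], componentwise with Δ_h or Δ_v
    return [x[r] + lam[r] * d[r] for r in range(n)]


def to_q_model(A_d, B_d, dh, dv, nh):
    """A_q = I + Δ·A_δ and B_q = Δ·B_δ with Δ block diagonal."""
    n = len(A_d)
    lam = block_delta(n, nh, dh, dv)
    A_q = [[(1 if r == c else 0) + lam[r] * A_d[r][c] for c in range(n)] for r in range(n)]
    B_q = [[lam[r] * b for b in B_d[r]] for r in range(n)]
    return A_q, B_q


def roesser_q_step(A_q, B_q, x, u):
    """The same local step computed with the shift-operator model."""
    n = len(x)
    return [sum(A_q[r][c] * x[c] for c in range(n)) + sum(B_q[r][k] * u[k] for k in range(len(u)))
            for r in range(n)]


A_d = [[Fraction(-1), Fraction(1, 2)], [Fraction(1, 4), Fraction(-2)]]
B_d = [[Fraction(1)], [Fraction(0)]]
dh, dv, nh = Fraction(1, 8), Fraction(1, 16), 1
x, u = [Fraction(3), Fraction(-5)], [Fraction(2)]
A_q, B_q = to_q_model(A_d, B_d, dh, dv, nh)
print(roesser_delta_step(A_d, B_d, x, u, dh, dv, nh) == roesser_q_step(A_q, B_q, x, u))
```

In exact arithmetic the two forms coincide; the finite wordlength differences discussed above arise only once each equation is quantized.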


Stability. In the 1-D case, it has been shown that direct techniques, with no recourse to transformations that first convert a given δ-system to its q-system counterpart, can provide numerically more reliable stability checking algorithms. With this in mind, for the 2-D case, a direct stability checking technique applicable to the corresponding δ-system transfer function has been introduced. For this purpose, a recently developed tabular form was extended to the complex coefficient case, and the notion of Schur-Cohn minors was introduced to the δ-operator case.
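The tabular Schur-Cohn procedure itself is not reproduced here. As 1-D background only, this sketch shows the δ-domain stability region: a pole γ is stable if |1 + Δγ| < 1, that is, if it lies inside the disc |γ + 1/Δ| < 1/Δ, which tends to the open left half-plane as Δ → 0. Unlike the direct tabular methods described above, the sketch computes the roots explicitly. The second-order denominator γ² + 2γ + 2 (poles -1 ± j) and the function names are hypothetical.

```python
import cmath


def delta_poles_2nd_order(a1: complex, a0: complex):
    """Roots γ of the delta-domain polynomial γ^2 + a1·γ + a0."""
    disc = cmath.sqrt(a1 * a1 - 4 * a0)
    return ((-a1 + disc) / 2, (-a1 - disc) / 2)


def delta_stable(poles, delta: float) -> bool:
    """Stable iff every pole lies strictly inside |γ + 1/Δ| < 1/Δ (equivalently |1 + Δγ| < 1)."""
    return all(abs(g + 1 / delta) < 1 / delta for g in poles)


poles = delta_poles_2nd_order(2, 2)  # γ = -1 ± j
for delta in (0.01, 0.5, 1.0):
    print(delta, delta_stable(poles, delta))
```

The poles -1 ± j are stable for Δ = 0.01 and Δ = 0.5, but for Δ = 1 they map to the q-poles ±j on the unit circle, so the strict test fails.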


Gramians and balanced realization. The notions of reachability and observability gramians and balanced realization have been introduced for the δ-operator case. In order to do this, first the relationship between the gramians for the δ- and q-operator cases, as defined in the literature, was established. The reachability and observability gramians, that is, P and Q, respectively, for 1-D δ-systems were found to satisfy

$$P = \frac{1}{2\pi j} \oint_{T_\delta} (cI - A_\delta)^{-1} B_\delta B_\delta^{*} (c^{*}I - A_\delta^{*})^{-1} \frac{dc}{1 + \Delta c}, \qquad Q = \frac{1}{2\pi j} \oint_{T_\delta} (c^{*}I - A_\delta^{*})^{-1} C_\delta^{*} C_\delta (cI - A_\delta)^{-1} \frac{dc}{1 + \Delta c}, \tag{T3.7}$$

where T_δ is the stability boundary applicable to δ-systems, that is, T_δ = {c ∈ ℂ : |c + 1/Δ| = 1/Δ}. An extension of this is then used to define the 2-D gramians of δ-systems represented in the Roesser model developed above.
For the important class of separable (that is, separable-in-denominator) systems, it is shown that these gramians may be computed through the solution of four Lyapunov equations. These notions and results are useful in many applications, such as extracting reduced order models of δ-systems.
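The four Lyapunov equations of the 2-D separable case are not spelled out here. As a 1-D illustration only, assume the normalization P = P_q/Δ. Substituting (T1.5) into the q-domain equation A_q P_q A_q^T - P_q + B_q B_q^T = 0 then gives the δ-domain Lyapunov equation A_δP + PA_δ^T + Δ·A_δPA_δ^T + B_δB_δ^T = 0, which tends to the continuous-time Lyapunov equation as Δ → 0. The sketch vectorizes this equation and solves it exactly with Gauss-Jordan elimination; the function names and test matrices are hypothetical.

```python
from fractions import Fraction


def solve(M, rhs):
    """Gauss-Jordan elimination with exact Fractions (M is assumed nonsingular)."""
    n = len(M)
    aug = [row[:] + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = Fraction(1) / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def delta_reachability_gramian(A, B, delta):
    """Solve A P + P A^T + Δ·A P A^T + B B^T = 0 for P (normalized so that P = P_q / Δ)."""
    n = len(A)
    idx = [(k, l) for k in range(n) for l in range(n)]  # unknowns P[k][l]
    M, rhs = [], []
    for r, s in idx:                                    # one equation per entry (r, s)
        row = []
        for k, l in idx:
            coeff = A[r][k] * (1 if l == s else 0) + (1 if k == r else 0) * A[s][l]
            coeff += delta * A[r][k] * A[s][l]
            row.append(coeff)
        M.append(row)
        rhs.append(-sum(B[r][c] * B[s][c] for c in range(len(B[0]))))
    p = solve(M, rhs)
    return [[p[k * n + l] for l in range(n)] for k in range(n)]


print(delta_reachability_gramian([[Fraction(-1)]], [[Fraction(1)]], Fraction(1, 10)))
```

For the scalar case a_δ = -1, b_δ = 1, Δ = 1/10, the result is P = 10/19, which matches P_q/Δ with P_q = b_q²/(1 - a_q²) = 1/19 for a_q = 9/10, b_q = 1/10.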

Sensitivity. Measures that indicate the coefficient sensitivity of the δ-models developed above have been introduced. Unlike what is available in the literature, this development is applicable to the MIMO case as well. With these sensitivity measures as a guide, minimum sensitivity structures have been developed. The connection with the corresponding balanced realizations has been pointed out.

Roundoff noise. With the use of a noise model that takes into account the roundoff error propagation in the s.s. model developed above, structures that minimize roundoff noise have been developed.
ABSTRACT

This paper analyzes the problem of global asymptotic stability of delta-operator formulated discrete-time systems implemented in fixed-point arithmetic. It is shown that the free response of such a system tends to produce period-one limit cycles if conventional quantization schemes are used. Explicit necessary conditions for global asymptotic stability are derived; these demonstrate that, in almost all cases, fixed-point arithmetic does not allow for global asymptotic stability in delta-operator formulated discrete-time systems that use a short sampling time.


I.  INTRODUCTION 

Recently, discrete-time systems formulated with the incremental difference operator (or δ-operator) have been receiving considerable attention in the technical literature [1-4]. Most of this work focuses on their superior performance under finite wordlength conditions when compared with systems formulated with the shift-operator (or q-operator). In particular, investigations of coefficient sensitivity and quantization noise properties have revealed that δ-operator formulations usually perform significantly better than their q-operator counterparts [1-4]. This is especially true for high-speed applications, where the sampling rate is much larger than the underlying system bandwidth. Under these conditions, q-operator formulated discrete-time systems tend to become ill-conditioned [1-2].

Although a large amount of work is available on the effects of coefficient sensitivity and quantization noise, a deterministic study of the nonlinear behavior of discrete-time systems formulated with the δ-operator has not been undertaken. In the case of floating-point (FLP) arithmetic, some results for feedback systems are available in [2].

In this work, we focus on the convergence behavior of the unforced system response and the global asymptotic stability of δ-operator formulated discrete-time systems implemented in fixed-point (FXP) arithmetic. In particular, via necessary conditions for stability, it will be shown that such systems tend to produce DC limit cycles.

The structure of this article is as follows: In Section II, we introduce notation and nomenclature, and the model for δ-operator formulated discrete-time systems, with and without quantization nonlinearities, is briefly discussed. Section III addresses the problem of asymptotic stability when FXP arithmetic is used for the implementation. In terms of the ensuing DC limit cycles, necessary conditions for global asymptotic stability are formulated. It is shown that, when FXP arithmetic is used, stability of the linear system is often lost. Section IV provides concluding remarks.


II.  NOTATION  AND  NOMENCLATURE 

Since our focus is on the stability properties of δ-operator formulated discrete-time systems under unforced conditions, the state equations of the system under zero input will be considered.

In the linear case, the general m-th order state-space representation is given by


$$\delta[x](n) = A_\delta\, x(n); \tag{1}$$

$$x(n+1) = x(n) + \Delta \cdot \delta[x](n), \tag{2}$$

where x(n) = [x_1(n), ..., x_m(n)]^T is the state vector at instant n, A_δ = {a_ij^δ} ∈ ℝ^{m×m} is the system matrix, and Δ > 0 is the sampling time. Moreover, δ[·] represents the δ-operator, that is,

$$\delta[x_\nu](n) = \frac{x_\nu(n+1) - x_\nu(n)}{\Delta}, \tag{3}$$

and δ[x](n) = [δ[x_1](n), ..., δ[x_m](n)]^T. The actual implementation of (1) and (2) in FXP format gives rise to nonlinear quantization operations that occur at various locations depending on the hardware realization.

Eqn. (1) can be implemented either by using single wordlength accumulators (creating a quantization error after each multiplication) or by using double wordlength accumulators (creating a quantization error only after summation). We will only consider the latter option, since practically all modern DSP machines implement it. Eqn. (1) can then be written as

$$\delta[x](n) = Q\{A_\delta\, x(n)\}, \tag{4}$$

where Q is a vector-valued quantization nonlinearity of the form

$$Q\{x\} = \begin{bmatrix} Q\{x_1\} \\ \vdots \\ Q\{x_m\} \end{bmatrix}. \tag{5}$$

Here, Q{x_ν} denotes magnitude truncation, two's complement truncation, or rounding.

Eqn. (2) can be implemented in two different ways:

$$x(n+1) = x(n) + Q\{\Delta \cdot \delta[x](n)\}, \tag{6}$$

or

$$x(n+1) = Q\{x(n) + \Delta \cdot \delta[x](n)\}. \tag{7}$$

Eqn. (6) corresponds to quantization after multiplication, while (7) corresponds to quantization after summation. In contrast to (1), for (2) it is not clear which of the two quantization schemes in (6) and (7) is preferable. We will therefore consider both possibilities.
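The two update schemes can be written out directly. This is a minimal sketch under our own assumptions: states are integers counting quantization steps, coefficients are exact Fractions, and the three quantizers are rounding (ties away from zero, an assumed tie rule), magnitude truncation, and two's complement truncation (toward minus infinity). The value Δ = 1/2 is chosen only to expose a case where (6) and (7) differ, and the function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Callable, List

Quantizer = Callable[[Fraction], int]


def q_round(v: Fraction) -> int:
    """Rounding to the nearest level, ties away from zero (assumed tie rule)."""
    if v >= 0:
        return math.floor(v + Fraction(1, 2))
    return -math.floor(-v + Fraction(1, 2))


def q_mag_trunc(v: Fraction) -> int:
    """Magnitude truncation (toward zero)."""
    return math.trunc(v)


def q_twos_trunc(v: Fraction) -> int:
    """Two's complement truncation (toward minus infinity)."""
    return math.floor(v)


def delta_diff(A: List[List[Fraction]], x: List[int], Q: Quantizer) -> List[int]:
    """Eqn. (4): one quantization per row after a double-length accumulation."""
    return [Q(sum(a * xj for a, xj in zip(row, x))) for row in A]


def update_quant_after_mult(x: List[int], d: List[int], delta: Fraction, Q: Quantizer) -> List[int]:
    """Eqn. (6): x(n+1) = x(n) + Q{Δ·δ[x](n)}."""
    return [xi + Q(delta * di) for xi, di in zip(x, d)]


def update_quant_after_sum(x: List[int], d: List[int], delta: Fraction, Q: Quantizer) -> List[int]:
    """Eqn. (7): x(n+1) = Q{x(n) + Δ·δ[x](n)}."""
    return [Q(xi + delta * di) for xi, di in zip(x, d)]


A = [[Fraction(-1)]]
x = [3]
d = delta_diff(A, x, q_mag_trunc)
for Q in (q_mag_trunc, q_twos_trunc):
    print(Q.__name__,
          update_quant_after_mult(x, d, Fraction(1, 2), Q),
          update_quant_after_sum(x, d, Fraction(1, 2), Q))
```

Under magnitude truncation the two schemes give different next states (2 versus 1 here), whereas under two's complement truncation they always coincide, because floor(x + v) = x + floor(v) whenever x lies on the quantization grid.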

Throughout this paper, we will use the following definition of stability:

Definition. The discrete-time system in {(4), (6)} or {(4), (7)} is globally asymptotically stable if and only if, for any initial condition x(0), the state vector x asymptotically reaches zero, that is, x(n) → 0 for n → ∞.

Comment. Since the FXP systems considered are in fact finite state machines, the condition x(n) → 0 for n → ∞ may be restated as x(N) = 0 for some finite N [5].

Finally, the symbol ε is used to denote the quantization step.


III.  NECESSARY  CONDITIONS 
FOR  STABILITY 

First, we will consider the system described by {(4), (6)}. From the definition of global asymptotic stability stated in the previous section, it is necessary that

$$Q\{\Delta \cdot \delta[x](n)\} \neq 0, \quad \text{for any } x(n) \neq 0, \tag{8}$$

since otherwise (6) yields x(n+1) = x(n) ≠ 0, that is, a DC limit cycle. This is just one of a finite set of conditions that are required to ensure global asymptotic stability of an FXP implementation of a linearly stable system [5].

In the case of rounding, condition (8) is violated if

$$|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for all } \nu = 1, \ldots, m. \tag{9}$$

The sampling time Δ in a δ-operator formulated implementation is typically very small. Dividing (9) by Δ, we have

$$|\delta[x_\nu](n)| < \frac{\varepsilon}{2\Delta}, \quad \text{for all } \nu = 1, \ldots, m. \tag{10}$$

In the case of magnitude truncation, (10) takes the form

$$|\delta[x_\nu](n)| < \frac{\varepsilon}{\Delta}, \quad \text{for all } \nu = 1, \ldots, m. \tag{11}$$

Accordingly, for two's complement truncation, we have

$$0 \le \delta[x_\nu](n) < \frac{\varepsilon}{\Delta}, \quad \text{for all } \nu = 1, \ldots, m. \tag{12}$$

Conditions (10)-(12) describe the deadband, in terms of δ[x], for which a DC limit cycle occurs. Such a limit cycle can be avoided only if (10)-(12) are satisfied by the zero vector alone. Since δ[x](n) is itself quantized by (4), its smallest nonzero entry has magnitude ε. In the case of rounding, we therefore require

$$\varepsilon > \frac{\varepsilon}{2\Delta}$$

or, equivalently,

$$\Delta > \frac{1}{2}, \tag{13}$$

which is impractical. Similarly, for magnitude and two's complement truncation, we obtain

$$\varepsilon > \frac{\varepsilon}{\Delta} \iff \Delta > 1, \tag{14}$$

which is equally impractical.

This  result  is  summarized  in  the  following  theorem. 
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Theorem  1.  A  necessary  condition  for  stability  of  the  In  the  case  of  the  remaining  two  quantization  schemes, 
{-operator  formulated  discrete-time  system  in  {(4),  (6)}  the  inequalities  corresponding  to  (16)  are  given  as  fol¬ 
ia  A  >  0.5  for  rounding  and  A  >  1  for  truncation.  tows:  For  two’s  complement  truncation, 

The  above  theorem  sVows  that  high-speed  6-operator 
formulated  implementations  that  possess  a  small  sam¬ 
pling  time  cannot  be  realized  limit  cycle  free  in  FXP 
format! 

A  second  necessary  condition  for  the  system  in  {(4),  (6)} 
can  be  obtained  by  noting  that 

=  0  (15) 

can  occur  in  (4)  even  though  the  state  vector  x(n)  ^  0. 


A  similar  analysis  can  be  conducted  for  the  system 
in  {(4), (7)}.  Since  (4)  is  common  to  both  realizations, 
(16-18)  are  still  valid  and  provide  conditions  under  which 
the  finite  difference  is  quantized  to  zero  and  a  DC  limit 
cycle  is  produced.  We  will  now  briefly  discuss  neces¬ 
sary  conditions  for  global  asymptotic  stability  obtained 
from  (7). 


may  be  allowed  to  exist.  Here,  the  inequality  has  to 
hold  elementwise.  Taking  norms  on  both  sides  of  (161 
one  gets  an  algebraic  condition  on  the  system  matrix  A‘ 
that  always  support  DC  limit  cycles,  l^n.  (16)  has  the 
following  interesting  interpretations: 

1.  Each  of  the  resulting  m  inequalities  can  be  geomet¬ 
rically  interpreted  as  the  intersection  of  two  half 
spaces  in  R"*.  These  intersections  are  symmetric 
about  the  origin  and  have  parallel  boundaries.  The 
normal  vector  to  the  boundaries  is  given  by  the 
particular  row  vector  of  A*.  Only  if  the  intersec¬ 
tion  of  all  such  m  half  spaces  contains  a  nonzero 
point  in  R”*,  and  if  it  belongs  to  the  quantization 
lattice,  will  there  exist  a  nonzero  state  vector  that 
is  an  equilibrium  point  of  the  system. 

2. Eqn. (16) can also be interpreted from an eigenvalue/eigenvector viewpoint. In high-speed digital filters, where the sampling frequency is typically much higher than the bandwidth of the processed signal, the eigenvalues of a q-operator implementation cluster around the point z = 1 [1]. The corresponding δ-operator implementation has eigenvalues clustered around zero for large sampling times. However, as the sampling time becomes small, these eigenvalues move towards the eigenvalues of the underlying continuous-time system [1]. In other words, for large sampling times, the system matrix will be ill-conditioned; that is, vectors $x(n) \neq 0$ exist such that $A' \cdot x(n)$ is close to the zero vector. According to (16), this is likely to cause a DC limit cycle. For small sampling times, this problem may not occur; however, in this case, the conditions in Theorem 1 are not satisfied!
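The eigenvalue behaviour described in item 2 can be checked numerically for a single real pole. This is only an illustrative sketch: it assumes the exact mapping $z = e^{sT}$ from a continuous-time pole $s$ to the q-operator eigenvalue (the text does not specify a discretization method), takes the δ-operator eigenvalue as $(z-1)/T$, and uses a hypothetical pole $s = -2$.

```python
import math


def q_and_delta_eigenvalue(s, sample_time):
    """Return (q-operator eigenvalue, delta-operator eigenvalue) for a real
    continuous-time pole s, assuming the exact mapping z = exp(s*T)."""
    z = math.exp(s * sample_time)
    return z, (z - 1.0) / sample_time


if __name__ == "__main__":
    s = -2.0  # hypothetical continuous-time pole
    for sample_time in (100.0, 10.0, 1.0, 0.1, 0.001):
        z, d = q_and_delta_eigenvalue(s, sample_time)
        # Small T: z -> 1 and d -> s.  Large T: d -> 0.
        print(f"T={sample_time:<7} q-eig={z:.6f} delta-eig={d:.6f}")
```

Shrinking T drives the q-eigenvalue towards 1 and the δ-eigenvalue towards s. Very large T pushes the δ-eigenvalue towards zero, which corresponds to the two regimes described above.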


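For rounding, proceeding as in (9), we have

$$|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for any } \nu = 1, \ldots, m,$$

and therefore

$$|\delta[x_\nu](n)| < \frac{\varepsilon}{2\Delta}, \quad \text{for any } \nu = 1, \ldots, m. \qquad (19)$$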


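For magnitude truncation, we obtain

$$0 < \delta[x_\nu](n) < \frac{\varepsilon}{\Delta}, \quad \forall\, \delta[x_\nu] > 0, \qquad (20)$$

and

$$-\frac{\varepsilon}{\Delta} < \delta[x_\nu](n) < 0, \quad \forall\, \delta[x_\nu] < 0. \qquad (21)$$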

In the case of two's complement truncation, the condition for a DC limit cycle is given by

$$0 \le \delta[x_\nu](n) < \frac{\varepsilon}{\Delta}, \quad \forall \nu = 1, \ldots, m. \qquad (22)$$

With $\Delta = f \cdot \varepsilon$, $f$ being a 'small' integer, we come to the same conclusion as for the previously considered system:

$\Delta > \tfrac{1}{2}$ for rounding;

$\Delta > 1$ for truncation.

Therefore, Theorem 1 also holds for the system representation in {(4), (7)}.
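The thresholds above can be probed with a short sketch. It assumes a lattice normalized to ε = 1, so that the smallest nonzero finite difference is one LSB (±1). It also assumes that the finite difference itself lies on the lattice and that rounding breaks ties away from zero. The helper names are hypothetical. The sketch reports whether a one-LSB difference is quantized to zero after scaling by Δ, which is the situation that permits a DC limit cycle.

```python
import math


def q_mag_round(v):
    """Magnitude rounding onto the integer lattice, ties away from zero."""
    return math.copysign(math.floor(abs(v) + 0.5), v)


def q_mag_trunc(v):
    """Magnitude truncation (round toward zero)."""
    return math.copysign(math.floor(abs(v)), v)


def q_twos_trunc(v):
    """Two's complement truncation (round toward minus infinity)."""
    return float(math.floor(v))


def dc_limit_cycle_possible(quantizer, sample_time):
    """True if a one-LSB finite difference (+1 or -1) vanishes after
    Q{Delta * delta}, i.e. the state would stop moving although delta != 0."""
    return any(quantizer(sample_time * d) == 0 for d in (1, -1))


if __name__ == "__main__":
    for name, q in (("rounding", q_mag_round),
                    ("magnitude truncation", q_mag_trunc),
                    ("two's complement truncation", q_twos_trunc)):
        flags = {t: dc_limit_cycle_possible(q, t) for t in (0.25, 0.75, 1.5)}
        print(name, flags)
```

Under these assumptions, rounding no longer flags a possible DC limit cycle once Δ exceeds 1/2. Both truncation schemes stop flagging one once Δ exceeds 1.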
IV. CONCLUSION

Via a set of necessary conditions for global asymptotic stability, it has been shown that high-speed, limit cycle free δ-operator implementations of linear discrete-time systems cannot be realized. This is due to the tendency of such a realization to produce period-one limit cycles. This situation arises from small values in the finite difference being quantized to zero. Hence, convergence to the 'wrong' equilibrium point is very likely. Conditions on the system matrix and the sampling time that must hold if such limit cycle behavior is to be avoided have been provided. The results indicate that, in high-speed applications, these conditions cannot be satisfied with conventional quantization schemes.
Abstract: In this paper, the convergence properties of linearly stable multi-dimensional systems are investigated for the case of delta-operator implementations in fixed-point format. It is shown that zero-convergence is almost never achieved if the sampling time is small. Using a one-dimensional analysis, it is demonstrated that zero-convergence cannot be guaranteed along the axes of the first hyper-quadrant for a first hyper-quadrant causal system. This limits the use of delta-operators for solving partial differential equations in discrete time with fixed-point arithmetic.


I.  INTRODUCTION 

Delta-operator (or, δ-operator) implementations of discrete-time systems have been the topic of a number of research papers within the last decade. A comprehensive treatment of the properties of δ-operator implementations can be found in [1]. It is well known that δ-operators outperform shift-operators (or, q-operators) in terms of their finite wordlength properties [2]. In particular, its quantization noise and sensitivity properties make the δ-operator an interesting alternative to the q-operator in areas such as digital control, digital signal processing, and, more generally, discrete-time simulation of dynamical systems described by differential equations [1], [3].

In this paper, we will perform a deterministic analysis of the finite wordlength properties of multi-dimensional (m-D) δ-operator implemented discrete-time systems. In particular, we will investigate the zero-convergence of δ-operator fixed-point implementations of one-dimensional (1-D) and m-D systems. Although it is of vital importance, this problem has not been investigated thus far in the literature. After all, asymptotic stability and convergence to the true equilibrium points are among the most fundamental requirements for any discrete-time system realization.

This article is organized in the following way: Section II introduces the notation. The m-D δ-operator model will be introduced and briefly discussed. This section will also provide the problem formulation. Section III provides necessary 1-D stability conditions for m-D first hyper-quadrant causal systems with nonlinearities. Using these necessary conditions, Section IV provides a stability and convergence analysis for m-D systems. It will be shown that the resulting 1-D systems cannot ensure zero-convergence. Section V contains concluding remarks.


II. NOTATION AND PROBLEM FORMULATION

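The m-D Roesser model has the following δ-operator formulation [4]:

$$\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mm} \end{bmatrix} \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \begin{bmatrix} B_1 \\ \vdots \\ B_m \end{bmatrix} u(n), \qquad (1)$$

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \Delta \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}. \qquad (2)$$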

The input-state equations in (1) and (2) describe a first hyper-quadrant causal m-D system with a uniform sampling period of Δ in all directions. The operators $q^{(i)}$ and $\delta^{(i)}$ represent the shift- and delta-operator in the direction specified by the axis $n_i$. In particular,

$$q^{(i)}[x^{(i)}](n) = x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m), \qquad (3a)$$

$$\delta^{(i)}[x^{(i)}](n) = \frac{1}{\Delta}\Big(x^{(i)}(n_1, \ldots, n_{i-1}, n_i + 1, n_{i+1}, \ldots, n_m) - x^{(i)}(n)\Big). \qquad (3b)$$

Here, $n = (n_1, \ldots, n_m)$ denotes a point in the first hyper-quadrant, $x^{(i)}(n)$ is the portion of the state vector propagating in the direction specified by the axis $n_i$, $u(n)$ is the m-D input vector, and $A_{ij}$ and $B_i$, for $i = 1, \ldots, m$, $j = 1, \ldots, m$, are the submatrices of the system and input matrices, respectively.
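Equations (2), (3a), and (3b) can be illustrated on a sampled 2-D signal. In this sketch the state is a hypothetical scalar field stored in a dictionary keyed by index tuples, and the helper names are assumptions. It shows that the shift along axis $n_i$ equals the current value plus Δ times the δ-operator output.

```python
def shift(x, n, axis):
    """q^(i): value at the point n advanced by one sample along `axis` (eq. 3a)."""
    m = list(n)
    m[axis] += 1
    return x[tuple(m)]


def delta_op(x, n, axis, sample_period):
    """delta^(i): forward difference along `axis` divided by Delta (eq. 3b)."""
    return (shift(x, n, axis) - x[tuple(n)]) / sample_period


if __name__ == "__main__":
    big_delta = 0.5  # hypothetical uniform sampling period
    # Hypothetical 2-D field x(n1, n2) = n1 + 2*n2 on a 3 x 3 grid.
    x = {(i, j): float(i + 2 * j) for i in range(3) for j in range(3)}
    n = (1, 1)
    for axis in (0, 1):
        d = delta_op(x, n, axis, big_delta)
        # Relation (2): q[x] = x + Delta * delta[x]
        assert shift(x, n, axis) == x[n] + big_delta * d
        print(axis, d)
```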

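If (1) is realized in fixed-point arithmetic, it takes the following form under zero-input conditions:

$$\begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix} = Q\left\{ \begin{bmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mm} \end{bmatrix} \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} \right\}. \qquad (4)$$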

Equation (4) assumes quantization after summation; since practically all modern DSP machines implement this quantization scheme, we adopt it here. The vector-valued quantization nonlinearity $Q\{\cdot\}$ may represent any one of the conventional schemes, viz., magnitude truncation, magnitude rounding, two's complement truncation, and two's complement rounding.
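For experimentation, the four conventional schemes can be written as scalar functions on a lattice with step ε. This is a sketch under assumptions the text does not fix: magnitude rounding breaks ties away from zero, and two's complement rounding is taken as "add half an LSB, then truncate toward minus infinity". The function name `quantize` and the scheme labels are hypothetical. The vector-valued $Q\{\cdot\}$ would apply such a function elementwise.

```python
import math


def quantize(v, eps, scheme):
    """Quantize the real number v onto the lattice {k * eps} (hypothetical helper)."""
    u = v / eps
    if scheme == "magnitude_truncation":          # toward zero
        k = math.copysign(math.floor(abs(u)), u)
    elif scheme == "magnitude_rounding":          # ties away from zero (assumed)
        k = math.copysign(math.floor(abs(u) + 0.5), u)
    elif scheme == "twos_complement_truncation":  # toward minus infinity
        k = math.floor(u)
    elif scheme == "twos_complement_rounding":    # add half an LSB, then floor (assumed)
        k = math.floor(u + 0.5)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return k * eps


if __name__ == "__main__":
    for s in ("magnitude_truncation", "magnitude_rounding",
              "twos_complement_truncation", "twos_complement_rounding"):
        print(s, quantize(-0.7, 0.25, s), quantize(-0.125, 0.25, s))
```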

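Equation (2) can be implemented in two different forms:

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = \begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + Q\left\{\Delta \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}\right\}, \qquad (5)$$

$$\begin{bmatrix} q^{(1)}[x^{(1)}](n) \\ \vdots \\ q^{(m)}[x^{(m)}](n) \end{bmatrix} = Q\left\{\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} + \Delta \begin{bmatrix} \delta^{(1)}[x^{(1)}](n) \\ \vdots \\ \delta^{(m)}[x^{(m)}](n) \end{bmatrix}\right\}. \qquad (6)$$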

Equation (5) corresponds to quantization after multiplication, whereas (6) corresponds to quantization after addition. In contrast to (1), for (2) it is not obvious which of the two forms stated above is preferable.
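A scalar sketch shows why the choice between (5) and (6) matters. Assumptions (not from the source): ε = 1, the current state $x$ is an integer lattice point, $y$ stands for the real product $\Delta \cdot \delta$, and the helper names are hypothetical. Under magnitude truncation the two forms can disagree. Under two's complement truncation they coincide, because $\lfloor x + y \rfloor = x + \lfloor y \rfloor$ for integer $x$.

```python
import math


def mag_trunc(v):
    """Magnitude truncation onto the integer lattice (round toward zero).
    Adding 0.0 turns a signed -0.0 into 0.0."""
    return math.copysign(math.floor(abs(v)), v) + 0.0


def update_form5(x, y, q):
    """Quantization after multiplication: x + Q{Delta*delta}, cf. (5)."""
    return x + q(y)


def update_form6(x, y, q):
    """Quantization after addition: Q{x + Delta*delta}, cf. (6)."""
    return q(x + y)


if __name__ == "__main__":
    x, y = -1, 0.5  # hypothetical state and Delta*delta
    print(update_form5(x, y, mag_trunc), update_form6(x, y, mag_trunc))
    # Two's complement truncation (floor): both forms agree on the lattice.
    same = all(update_form5(a, b, math.floor) == update_form6(a, b, math.floor)
               for a in range(-3, 4) for b in (-1.5, -0.5, 0.25, 0.75))
    print(same)
```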

The following definition of asymptotic stability [5] will be used throughout this paper.

Definition. An m-D first hyper-quadrant causal discrete-time system is asymptotically stable under all finitely extended bounded input signals u(n), where

$$|u(n)| \le S, \quad \text{for } n_1 + \cdots + n_m < D; \qquad (7)$$

$$u(n) = 0, \quad \text{for } n_1 + \cdots + n_m \ge D, \qquad (8)$$

if all the states of the m-D discrete-time system asymptotically reach zero for $n_1 + \cdots + n_m \to \infty$. Here, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, S is a nonnegative real number, and D is a positive integer.

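Since the fixed-point systems considered are in fact finite state machines, the condition

$$\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} \to 0$$

for $n_1 + \cdots + n_m \to \infty$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, can be strengthened to

$$\begin{bmatrix} x^{(1)}(n) \\ \vdots \\ x^{(m)}(n) \end{bmatrix} = 0$$

for all points $n_1 + \cdots + n_m > c$, $n_\nu \ge 0$, $\nu = 1, \ldots, m$, where c is some finite integer.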

Problem Formulation. Analyze the asymptotic zero-convergence of the state response of the systems in (4,5) and (4,6) under the assumption that the underlying linear system is asymptotically stable.

III. NECESSARY CONDITIONS FOR
GLOBAL  ASYMPTOTIC  STABILITY 
OF  m-D  SYSTEMS 

In this section, we present some necessary conditions for stability of a first hyper-quadrant causal m-D discrete-time system represented in its Roesser local state-space model in (1,2). These necessary conditions are formulated in terms of 1-D conditions. The theorem below follows directly from a result in [6], which was formulated for q-operator implemented discrete-time systems. Its proof rests on the fact that a first hyper-quadrant m-D system can be described by a 1-D system for those locations that lie along the m coordinate axes on the boundary of the hyper-quadrant. Reformulating the result in [6] for δ-operator systems produces the following theorem:
Theorem 1.

(a) A necessary condition for global asymptotic stability of the system in (4,5) is that each of the following 1-D systems in (9,10) is globally asymptotically stable:

$$\delta^{(i)}[x^{(i)}](n_i) = Q\{A_{ii}\, x^{(i)}(n_i)\}; \qquad (9)$$

$$q^{(i)}[x^{(i)}](n_i) = x^{(i)}(n_i) + Q\{\Delta\, \delta^{(i)}[x^{(i)}](n_i)\}, \qquad (10)$$

where $i = 1, \ldots, m$.

(b) A necessary condition for global asymptotic stability of the system in (4,6) is that each of the following 1-D systems in (11,12) is globally asymptotically stable:

$$\delta^{(i)}[x^{(i)}](n_i) = Q\{A_{ii}\, x^{(i)}(n_i)\}; \qquad (11)$$

$$q^{(i)}[x^{(i)}](n_i) = Q\{x^{(i)}(n_i) + \Delta\, \delta^{(i)}[x^{(i)}](n_i)\}, \qquad (12)$$

where $i = 1, \ldots, m$.

Proof.  For  a  detailed  proof,  and  generalizations  to  higher 
sub-dimensional  systems,  the  reader  is  referred  to  [6].  ■ 

Theorem 1 can be viewed as an extension of the concept of practical BIBO stability to asymptotic stability of nonlinear systems. It is particularly useful in proving instability in m-D nonlinear systems.

IV.  NECESSARY  CONDITIONS  FOR 
GLOBAL  ASYMPTOTIC  STABILITY 
OF  1-D  SYSTEMS 

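Let us rewrite (9), (10), and (12) as 1-D matrix equations of order K. In this case, (9), (10), and (12) yield (13), (14), and (15), respectively:

$$\begin{bmatrix}\delta[x_1](n) \\ \vdots \\ \delta[x_K](n)\end{bmatrix} = Q\left\{A_{ii}\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix}\right\}; \qquad (13)$$

$$\begin{bmatrix} q[x_1](n) \\ \vdots \\ q[x_K](n)\end{bmatrix} = \begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix} + Q\left\{\Delta \begin{bmatrix}\delta[x_1](n) \\ \vdots \\ \delta[x_K](n)\end{bmatrix}\right\}; \qquad (14)$$

$$\begin{bmatrix} q[x_1](n) \\ \vdots \\ q[x_K](n)\end{bmatrix} = Q\left\{\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix} + \Delta \begin{bmatrix}\delta[x_1](n) \\ \vdots \\ \delta[x_K](n)\end{bmatrix}\right\}. \qquad (15)$$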


Now we are in a position to formulate the second theorem, which presents a necessary condition for stability of 1-D systems.

Theorem 2. A necessary condition for global asymptotic stability of the system in (13,14) or (13,15) is given by

$\Delta > 0.5$, for magnitude rounding;

$\Delta > 1$, for truncating.

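Proof. For global asymptotic stability of (13,14), it is necessary that

$$Q\left\{\Delta \begin{bmatrix}\delta[x_1](n) \\ \vdots \\ \delta[x_K](n)\end{bmatrix}\right\} \neq 0 \qquad (16)$$

for any $\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix} \neq 0$.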

First, we will address the case of magnitude rounding. Obviously, condition (16) is violated if, for $x_\nu \neq 0$,

$$|\Delta \cdot \delta[x_\nu](n)| < \frac{\varepsilon}{2}, \quad \text{for } \nu = 1, \ldots, K, \qquad (17)$$

where ε is the quantization step. Expressing the sampling time Δ as an integer multiple of ε, we have

$$\Delta = f \varepsilon, \qquad (18)$$


where f is some (typically small) positive integer. With (17) and (18), we obtain the following condition for instability:

$$|\delta[x_\nu](n)| < \frac{1}{2f}, \quad \nu = 1, \ldots, K, \qquad (19)$$

for $x_\nu \neq 0$, $\nu = 1, \ldots, K$.

Condition (19) is not satisfied for any nonzero value of $x_\nu$ (that is, the condition for instability is not satisfied) if $f > 1/2$, or equivalently,

$$\Delta > \tfrac{1}{2}. \qquad (20)$$

This  proves  the  theorem  for  magnitude  rounding. 
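The rounding case can be reproduced with a scalar (K = 1) zero-input simulation of (13,14). The sketch assumes ε = 1, magnitude rounding with ties away from zero, and a hypothetical coefficient $A_{ii} = -1$. For this coefficient the underlying linear recursion $x(n+1) = (1 + \Delta A_{ii})\,x(n)$ is stable for both step sizes tried. With Δ = 0.25 the state freezes at a nonzero value, while with Δ = 0.75 it reaches zero.

```python
import math


def q_round(v):
    """Magnitude rounding onto the integer lattice (eps = 1), ties away from zero."""
    return math.copysign(math.floor(abs(v) + 0.5), v)


def simulate_13_14(a, sample_time, x0, steps=50):
    """Zero-input scalar version of (13,14):
    delta = Q{a*x};  x <- x + Q{Delta*delta}.  Returns the final state."""
    x = float(x0)
    for _ in range(steps):
        d = q_round(a * x)
        x = x + q_round(sample_time * d)
    return x


if __name__ == "__main__":
    print(simulate_13_14(-1.0, 0.25, 10))  # stuck at a nonzero equilibrium
    print(simulate_13_14(-1.0, 0.75, 10))  # converges to zero
```

This illustrates the instability mechanism only. Meeting Δ > 1/2 is a necessary condition and does not by itself guarantee convergence.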

For the case of magnitude truncating, (17) takes the form

$$|\Delta \cdot \delta[x_\nu](n)| < \varepsilon, \quad \text{for } \nu = 1, \ldots, K. \qquad (21)$$

Therefore, (19) becomes

$$|\delta[x_\nu](n)| < \frac{1}{f}. \qquad (22)$$

This finally yields

$$\Delta > 1. \qquad (23)$$
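The same experiment with magnitude truncation in both quantizers shows the stricter threshold. As before, the sketch assumes ε = 1 and a hypothetical coefficient $A_{ii} = -1$. The linear poles $1 + \Delta A_{ii}$ are 0.1 and -0.5 for the two step sizes, so both are stable. Δ = 0.9 leaves the state at 1, while Δ = 1.5 reaches zero.

```python
import math


def q_trunc(v):
    """Magnitude truncation onto the integer lattice (eps = 1)."""
    return math.copysign(math.floor(abs(v)), v)


def simulate_trunc(a, sample_time, x0, steps=50):
    """Zero-input scalar (13,14) with magnitude truncation in both quantizers."""
    x = float(x0)
    for _ in range(steps):
        d = q_trunc(a * x)
        x = x + q_trunc(sample_time * d)
    return x


if __name__ == "__main__":
    print(simulate_trunc(-1.0, 0.9, 5))  # Delta < 1: freezes at x = 1
    print(simulate_trunc(-1.0, 1.5, 4))  # Delta > 1: reaches zero
```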

For two's complement, (17) takes the form

$$0 \le \Delta \cdot \delta[x_\nu](n) < \varepsilon, \quad \text{for } \nu = 1, \ldots, K. \qquad (24)$$

This results in

$$0 \le \delta[x_\nu](n) < \frac{1}{f}, \qquad (25)$$

and consequently, $\Delta > 1$. This proves the theorem for the system in (13,14). A similar argument can be used for the system in (13,15) by considering the cases for which

$$Q\left\{\begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix} + \Delta\begin{bmatrix}\delta[x_1](n) \\ \vdots \\ \delta[x_K](n)\end{bmatrix}\right\} = \begin{bmatrix} x_1(n) \\ \vdots \\ x_K(n)\end{bmatrix} \qquad (26)$$

for nonzero state vectors. ■
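For the form (13,15) with two's complement truncation, condition (26) can be observed directly. The sketch assumes ε = 1, floor-type quantization in both places, and the hypothetical coefficient $A_{ii} = -1$. Starting from a negative state, Δ = 0.5 leaves the state at -1, where $Q\{x + \Delta\,\delta\} = x$. With Δ = 1.5 the state reaches zero.

```python
import math


def simulate_13_15(a, sample_time, x0, steps=50):
    """Zero-input scalar version of (13,15) with two's complement truncation:
    delta = floor(a*x);  x <- floor(x + Delta*delta)."""
    x = x0
    for _ in range(steps):
        d = math.floor(a * x)
        x = math.floor(x + sample_time * d)
    return x


if __name__ == "__main__":
    print(simulate_13_15(-1.0, 0.5, -5))  # settles at -1, i.e. (26) holds
    print(simulate_13_15(-1.0, 1.5, -5))  # settles at 0
```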

We can now combine Theorems 1 and 2 to formulate a necessary condition for stability of m-D first hyper-quadrant causal δ-operator formulations of the generalized Roesser model.


Corollary 3. A necessary condition for global asymptotic stability of the m-D systems in (4,5) or (4,6) is

$\Delta > 0.5$, for magnitude rounding;

$\Delta > 1$, for truncating.

Proof. The proof follows from Theorems 1 and 2. ■

Comments. 

1. Theorem 2 and Corollary 3 are also essentially applicable to the case where the sampling time varies with the direction of propagation. In this case, the inequalities in Theorem 2 and Corollary 3 would have to be replaced by

$\Delta_i > 0.5$, for magnitude rounding;

$\Delta_i > 1$, for truncating,
for $i = 1, \ldots, m$.

2. Most of the previous results on the superior finite wordlength properties of δ-operators depend on choosing a very small sampling time Δ. In such a case, Theorem 2 and Corollary 3 show that the system response will not converge to zero for the unforced case.

3. Our analysis is limited to the zero-input case, for which DC limit cycles were used to derive conditions for non-convergence. If one includes other types of limit cycles in the analysis, the requirements for Δ may become even more severe.

4. Theorem 2 and Corollary 3 show that fixed-point implementations of 1-D and m-D δ-operator systems cannot be realized limit cycle free if good coefficient sensitivity and quantization noise measures have to be achieved. See also [7].


V.  CONCLUSION 

In this paper, it was shown that fixed-point implementations of 1-D and m-D δ-operator systems are not limit cycle free even if the underlying linear system is stable and the sampling time is chosen small. This non-convergent behavior can be explained by the quantization of the δ-term to zero, which leaves the state vector unchanged. The smaller the sampling time, the more severe this effect is. Therefore, the practical value of δ-operators for fixed-point implementations of 1-D and m-D systems is questionable. There are, however, indications that this effect is much less severe in floating-point implementations.

δ-operator implemented discrete-time systems represent a class of systems where the quantization noise at the output can be small compared to other realizations. However, as was shown above, such realizations will invariably exhibit limit cycle behavior, that is, highly correlated quantization noise. Therefore, in this case, typical measures for quantization noise are of very limited use for obtaining any insight into the likelihood of limit cycles, and vice versa.
Abstract: By developing the δ-operator analog of the Roesser model, state-space realization of two- and multi-dimensional δ-systems is investigated. The corresponding notions of gramians and balanced realization are also defined. It is shown that discrete-time system implementation using this model can yield superior coefficient sensitivity properties.

I.  Introduction 

Judging by its performance in the one-dimensional (1-D) case [2], [5-6], one is led to expect superior coefficient sensitivity and roundoff noise performance with δ-operator implementation of two-dimensional (2-D) and multi-dimensional (m-D) discrete-time (DT) systems. With this in mind, the δ-operator analog of the q-operator Roesser local state-space (s.s.) model [12] is developed. We propose the notions of gramians and balanced (BL) realization. As expected, realization using this model can provide superior coefficient sensitivity properties.

II.  Nomenclature  and  Preliminaries 
A.  Nomenclature 

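$\mathbb{R}$, $\mathbb{C}$: Reals, complex numbers; $\mathbb{R}^{q \times p}$, $I_n$: Matrices of size $q \times p$ over $\mathbb{R}$ and the $n \times n$ unit matrix; $A^*$, trace$[A]$, $\|A\|_F$: Conjugate transpose, trace, and Frobenius norm of matrix $A$; $e_i$: Unit vector in $\mathbb{R}^n$ with 1 on the $i$-th row; $E_{ij} \in \mathbb{R}^{q \times p}$: matrix with 1 in the $(i,j)$ position and zeros elsewhere; $U_{q \times p} = \sum_{i=1}^{q}\sum_{j=1}^{p} E_{ij} \otimes E_{ij}$.

For q- and δ-systems, we use the indeterminates z and c, respectively. For 1-D systems, $\delta = (q - 1)/\tau \iff c = (z - 1)/\tau$, where τ is a positive real constant, usually the sampling time. Let $\mathcal{U}_c^2 = \{(c_h, c_v) \in \mathbb{C}^2 : |c_h + 1/\tau_h| < 1/\tau_h,\ |c_v + 1/\tau_v| < 1/\tau_v\}$; $\mathcal{T}_c^2$ is its boundary. The corresponding q-system regions are denoted with the subscript z.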

K.P. and P.H.B. gratefully acknowledge the support received from the Office of Naval Research (ONR) through the grants N00014-94-1-04S4 and N00014-94-1-0387, respectively.


B.  Preliminaries 

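Consider a linear, shift-invariant, strictly causal, p-input q-output 2-D DT system. Its $(n_h + n_v)$-th order Roesser local s.s. model $\{\bar A, \bar B, \bar C, \bar D\}$ takes the form [12]:

$$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} = \bar A \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \bar B u(i,j); \quad y(i,j) = \bar C \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \bar D u(i,j), \qquad (2.1)$$

where $u \in \mathbb{R}^p$, $x^h \in \mathbb{R}^{n_h}$, $x^v \in \mathbb{R}^{n_v}$, and $y \in \mathbb{R}^q$; $x^h$ and $x^v$ are the h.p. and v.p. local state vectors. Take $n = n_h + n_v$. Also,

$$q_h[x](i,j) = x(i+1, j); \quad q_v[x](i,j) = x(i, j+1). \qquad (2.2)$$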

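In what follows, we use matrix partitioning that conforms to $x^h$ and $x^v$: $\bar A = \begin{bmatrix} \bar A^{(1)} & \bar A^{(2)} \\ \bar A^{(3)} & \bar A^{(4)} \end{bmatrix}$, $\bar B = \begin{bmatrix} \bar B^{(1)} \\ \bar B^{(2)} \end{bmatrix}$, and $\bar C = [\bar C^{(1)}\ \ \bar C^{(2)}]$. The corresponding 2-D characteristic equation and transfer function are

$$\det[I_z - \bar A] = \det[z_h I_{n_h} \oplus z_v I_{n_v} - \bar A]; \quad \bar H(z_h, z_v) = \bar C (I_z - \bar A)^{-1} \bar B + \bar D,$$

where $z_h, z_v \in \mathbb{C}$ and $I_z = z_h I_{n_h} \oplus z_v I_{n_v} \in \mathbb{C}^{n \times n}$. With no nonessential singularities of the second kind (NSSK) on $\mathcal{T}_z^2$, $\{\bar A, \bar B, \bar C, \bar D\}$ is BIBO stable iff [3]

$$\det[I_z - \bar A] \neq 0, \quad \forall (z_h, z_v) \in \overline{\mathcal{U}}_z^2. \qquad (2.4)$$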

III. 2-D δ-Model
A.  Local  s.s.  model 

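Analogous to the 1-D case, define $\delta_h[\cdot]$ and $\delta_v[\cdot]$ as

$$\delta_h[x](i,j) = \frac{x(i+1,j) - x(i,j)}{\tau_h} = \frac{q_h[x](i,j) - x(i,j)}{\tau_h}; \quad \delta_v[x](i,j) = \frac{x(i,j+1) - x(i,j)}{\tau_v} = \frac{q_v[x](i,j) - x(i,j)}{\tau_v}. \qquad (3.1)$$

Here, $\tau_h$ and $\tau_v$ are positive real constants denoting the 'sampling times' along the h.p. and v.p. directions, respectively. Note that

$$q_h = 1 + \tau_h \delta_h; \quad q_v = 1 + \tau_v \delta_v, \qquad (3.2)$$

and, letting $\Gamma = \tau_h I_{n_h} \oplus \tau_v I_{n_v} \in \mathbb{R}^{n \times n}$,

$$\begin{bmatrix} q_h[x^h](i,j) \\ q_v[x^v](i,j) \end{bmatrix} = \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + \Gamma \begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix}. \qquad (3.3)$$

Using (3.3) in (2.1), we get

$$\begin{bmatrix} \delta_h[x^h](i,j) \\ \delta_v[x^v](i,j) \end{bmatrix} = [A] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [B]\, u(i,j); \quad y(i,j) = [C] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix} + [D]\, u(i,j), \qquad (3.4)$$

where $A = \begin{bmatrix} A^{(1)} & A^{(2)} \\ A^{(3)} & A^{(4)} \end{bmatrix}$, $B = \begin{bmatrix} B^{(1)} \\ B^{(2)} \end{bmatrix}$, and $C = [C^{(1)}\ \ C^{(2)}]$. In addition, we need to perform

$$q_h[x^h] = x^h + \tau_h \cdot \delta_h[x^h], \quad q_v[x^v] = x^v + \tau_v \cdot \delta_v[x^v]. \qquad (3.5)$$

Here,

$$\bar A = I_n + \Gamma A; \quad \bar B = \Gamma B; \quad \bar C = C; \quad \bar D = D. \qquad (3.6)$$

Let $I_c = c_h I_{n_h} \oplus c_v I_{n_v} \in \mathbb{C}^{n \times n}$. Then, the 2-D δ-model's characteristic equation and transfer function are

$$\det[I_c - A] = \frac{\det[I_z - \bar A]\big|_{z \to c}}{\det[\Gamma]}; \quad H(c_h, c_v) = \bar H(z_h, z_v)\big|_{z \to c}, \qquad (3.9)$$

where

$$z_h = 1 + \tau_h c_h; \quad z_v = 1 + \tau_v c_v. \qquad (3.10)$$

From now on, the variable transformation in (3.10) is denoted by $c \to z$ or $z \to c$, whichever is appropriate. Nonsingular transformations of the type

$$\begin{bmatrix} \hat x^h(i,j) \\ \hat x^v(i,j) \end{bmatrix} = [T] \begin{bmatrix} x^h(i,j) \\ x^v(i,j) \end{bmatrix}, \qquad (3.11)$$

where $T = T^{(1)} \oplus T^{(4)}$, yield the equivalent 2-D s.s. realization $\{\hat A, \hat B, \hat C, \hat D\}$. Here,

$$\hat A = T A T^{-1}, \quad \hat B = T B, \quad \hat C = C T^{-1}, \quad \hat D = D. \qquad (3.12)$$

With no NSSK on $\mathcal{T}_c^2$, $\{A, B, C, D\}$ is BIBO stable iff

$$\det[I_c - A] \neq 0, \quad \forall (c_h, c_v) \in \overline{\mathcal{U}}_c^2. \qquad (3.13)$$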
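Relations (3.6), (3.9), and (3.10) can be checked numerically for the smallest case $n_h = n_v = 1$. In this sketch the q-model matrix $\bar A$, the sampling times, and the test point are hypothetical values, and the helper names are assumptions. It recovers $A = \Gamma^{-1}(\bar A - I_n)$ from (3.6) and compares $\det[I_c - A]$ with $\det[I_z - \bar A]/\det[\Gamma]$ at $z = 1 + \tau c$.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def delta_from_q(a_bar, tau_h, tau_v):
    """Invert (3.6) for n_h = n_v = 1: A = Gamma^{-1} (A_bar - I)."""
    taus = (tau_h, tau_v)
    return [[(a_bar[i][j] - (1.0 if i == j else 0.0)) / taus[i]
             for j in range(2)] for i in range(2)]


def char_poly(m, s_h, s_v):
    """det[diag(s_h, s_v) - m] for a 2 x 2 matrix m."""
    return det2([[s_h - m[0][0], -m[0][1]],
                 [-m[1][0], s_v - m[1][1]]])


if __name__ == "__main__":
    a_bar = [[0.9, 0.1], [0.05, 0.8]]   # hypothetical q-model system matrix
    tau_h, tau_v = 0.5, 0.25            # hypothetical sampling times
    a = delta_from_q(a_bar, tau_h, tau_v)
    c_h, c_v = 0.3 + 0.2j, -0.1j        # arbitrary test point
    z_h, z_v = 1 + tau_h * c_h, 1 + tau_v * c_v   # (3.10)
    lhs = char_poly(a, c_h, c_v)
    rhs = char_poly(a_bar, z_h, z_v) / (tau_h * tau_v)
    print(abs(lhs - rhs))  # close to zero, up to floating-point rounding
```

The check works because $I_z - \bar A = \Gamma(I_c - A)$ when $z = 1 + \tau c$ in each direction.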


B. Properties of the 2-D δ-model

Most of the following properties may be derived in a manner that is exactly analogous to that in [12].

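The transition matrix $A^{i,j}$ of the δ-model may be recursively computed from

$$A^{i,j} = \begin{cases} 0, & (i,j) < (0,0); \\ I_{n_h} \oplus I_{n_v}, & (i,j) = (0,0); \\ \begin{bmatrix} I_{n_h} & 0 \\ 0 & 0 \end{bmatrix} + \Gamma \begin{bmatrix} A^{(1)} & A^{(2)} \\ 0 & 0 \end{bmatrix}, & (i,j) = (1,0); \\ \begin{bmatrix} 0 & 0 \\ 0 & I_{n_v} \end{bmatrix} + \Gamma \begin{bmatrix} 0 & 0 \\ A^{(3)} & A^{(4)} \end{bmatrix}, & (i,j) = (0,1); \\ A^{1,0} A^{i-1,j} + A^{0,1} A^{i,j-1}, & \text{otherwise.} \end{cases}$$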


The general response of the δ-model is


(3.7) 


C.  Gramians 

The gramians of 2-D q-systems are taken to be natural extensions of the integral expressions of their 1-D counterparts [11]. We will adopt a similar approach. In what follows, we consider the 1-D (or 2-D) stable δ-system $\{A, B, C, D\}$ with gramians P and Q. The corresponding q-system is $\{\bar A, \bar B, \bar C, \bar D\}$ with gramians $\bar P$ and $\bar Q$.

1-D case. The gramians are defined in [10].

Definition 3.1 [10]. The gramians are the solutions to the Lyapunov equations

$$AP + PA^* + \tau \cdot APA^* = -BB^*;$$

$$A^*Q + QA + \tau \cdot A^*QA = -C^*C.$$

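Lemma 3.1. The gramians satisfy the integral expressions

$$P = \frac{1}{2\pi j} \oint_{\mathcal{T}_c} F F^* \frac{dc}{1 + \tau c}, \quad Q = \frac{1}{2\pi j} \oint_{\mathcal{T}_c} G^* G \frac{dc}{1 + \tau c}, \qquad (3.8)$$

where $F(c) = (cI_n - A)^{-1} B$ and $G(c) = C (cI_n - A)^{-1}$. Moreover, $\bar P = \tau P$ and $\bar Q = Q/\tau$.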


Proof. Substitute $\bar A = I_n + \tau A$, $\bar B = \tau B$, $\bar C = C$, and $\bar D = D$ [10] in the equations in Definition 3.1, and note the integral expressions for $\bar P$ and $\bar Q$ in [8]. ■
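For a scalar 1-D system, the equations of Definition 3.1 and their q-operator counterparts have closed-form solutions, which makes the scaling $\bar P = \tau P$, $\bar Q = Q/\tau$ easy to check. The numbers below are hypothetical. The q-model is formed as $\bar a = 1 + \tau a$, $\bar b = \tau b$, $\bar c = c$, following the substitution in the proof. The q-gramians are taken from the standard discrete Lyapunov equations $\bar a^2 \bar P - \bar P = -\bar b^2$ and $\bar a^2 \bar Q - \bar Q = -\bar c^2$.

```python
def delta_gramians(a, b, c, tau):
    """Real scalar solutions of Definition 3.1:
    2aP + tau*a^2*P = -b^2  and  2aQ + tau*a^2*Q = -c^2."""
    den = 2.0 * a + tau * a * a
    return -b * b / den, -c * c / den


def q_gramians(a_bar, b_bar, c_bar):
    """Real scalar solutions of a_bar^2 P - P = -b_bar^2 (and likewise for Q)."""
    den = 1.0 - a_bar * a_bar
    return b_bar * b_bar / den, c_bar * c_bar / den


if __name__ == "__main__":
    a, b, c, tau = -1.0, 2.0, 0.5, 0.1   # hypothetical stable delta-system
    p, q = delta_gramians(a, b, c, tau)
    p_bar, q_bar = q_gramians(1.0 + tau * a, tau * b, c)
    # Both ratios should be 1 up to rounding: P_bar = tau*P, Q_bar = Q/tau.
    print(p_bar / (tau * p), q_bar * tau / q)
```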

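2-D case. With Lemma 3.1 in mind, we have

Definition 3.2. The gramians are

$$P = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{\mathcal{T}_c^2} F F^* \frac{dc_h}{1 + \tau_h c_h} \frac{dc_v}{1 + \tau_v c_v}, \quad Q = \frac{1}{(2\pi j)^2} \oint\!\!\oint_{\mathcal{T}_c^2} G^* G \frac{dc_h}{1 + \tau_h c_h} \frac{dc_v}{1 + \tau_v c_v},$$

where $P = \begin{bmatrix} P^{(1)} & P^{(2)} \\ P^{(3)} & P^{(4)} \end{bmatrix}$ and $Q = \begin{bmatrix} Q^{(1)} & Q^{(2)} \\ Q^{(3)} & Q^{(4)} \end{bmatrix}$. Also, $F(c_h, c_v) = (I_c - A)^{-1} B = [f_1, \ldots, f_n]^*$ and $G(c_h, c_v) = C (I_c - A)^{-1} = [g_1, \ldots, g_n]$.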


Remarks. 

1. Note that $(I_c - A)^{-1}\big|_{c \to z} = (I_z - \bar A)^{-1} \Gamma$, and

$$F\big|_{c \to z} = \bar F; \quad G\big|_{c \to z} = \bar G \cdot \Gamma. \qquad (3.14)$$


when $\{A, B, C, D\}$ is locally reachable and observable, $P^{(1)}$, $P^{(4)}$, $Q^{(1)}$, and $Q^{(4)}$ are each p.d.

Separable systems. A separable (in denominator) 2-D q-system will have $\bar A^{(2)} = 0$ (and/or $\bar A^{(3)} = 0$), and all off-diagonal submatrices of $\bar P$ and $\bar Q$ are zero. The diagonal submatrices may be computed through two pairs of Lyapunov equations [11]. Clearly, a separable 2-D q-system yields a separable 2-D δ-system.

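Theorem 3.6. Let $\{A, B, C, D\}$ be separable with $A^{(2)} = 0$. Then, $P^{(2)} = 0$ and $Q^{(2)} = 0$, and, with $R^{(4)}$ and $S^{(1)}$ as in Lemma 3.5,

$$A^{(1)}P^{(1)} + P^{(1)}A^{(1)*} + \tau_h A^{(1)}P^{(1)}A^{(1)*} = -B^{(1)}B^{(1)*}/\tau_v;$$

$$A^{(1)*}Q^{(1)} + Q^{(1)}A^{(1)} + \tau_h A^{(1)*}Q^{(1)}A^{(1)} = -\begin{bmatrix} C^{(1)} \\ R^{(4)}A^{(3)} \end{bmatrix}^* \begin{bmatrix} C^{(1)} \\ R^{(4)}A^{(3)} \end{bmatrix} \Big/ \tau_v;$$

$$A^{(4)}P^{(4)} + P^{(4)}A^{(4)*} + \tau_v A^{(4)}P^{(4)}A^{(4)*} = -\begin{bmatrix} B^{(2)} & A^{(3)}S^{(1)} \end{bmatrix} \begin{bmatrix} B^{(2)} & A^{(3)}S^{(1)} \end{bmatrix}^* \Big/ \tau_h;$$

$$A^{(4)*}Q^{(4)} + Q^{(4)}A^{(4)} + \tau_v A^{(4)*}Q^{(4)}A^{(4)} = -C^{(2)*}C^{(2)}/\tau_h.$$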


2. Definition 3.2 is completely analogous to the 1-D and 2-D q-systems [7], [11].

Lemma 3.2. $\bar P = \tau_h \tau_v P$ and $\bar Q = \tau_h \tau_v \Gamma^{-1} Q \Gamma^{-1}$.

Proof. Consider P in Definition 3.2. Use $c \to z$, (3.14), and the definition of gramians for 2-D q-systems [11]. ■

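The following are in complete analogy with 2-D q-systems.

Lemma 3.3. The gramians may be represented as

$$P = \frac{1}{\tau_h \tau_v} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \mathcal{A}_{i,j} \mathcal{A}_{i,j}^*, \quad Q = \frac{1}{\tau_h \tau_v}\, \Gamma \cdot \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} A^{i,j*} C^* C A^{i,j} \cdot \Gamma,$$

where, for $(i,j) = (0,0)$, $\mathcal{A}_{i,j} = 0$, and, for $(i,j) > (0,0)$,

$$\mathcal{A}_{i,j} = A^{i-1,j} \begin{bmatrix} \tau_h B^{(1)} \\ 0 \end{bmatrix} + A^{i,j-1} \begin{bmatrix} 0 \\ \tau_v B^{(2)} \end{bmatrix}.$$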

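Lemma 3.4. Let $\{\hat A, \hat B, \hat C, \hat D\}$ with gramians $\hat P$ and $\hat Q$ be an equivalent system as in (3.11)-(3.12). Then, $\hat P = T P T^*$ and $\hat Q = T^{-*} Q T^{-1}$. Moreover, the eigenvalues of $PQ$ are invariant; that is, $PQ$ and $\hat P \hat Q$ have the same eigenvalues.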

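Definition 3.3. $\{A, B, C, D\}$ is said to be balanced if

$$P^{(1)} = Q^{(1)} = \Sigma^{(1)} = \mathrm{diag}\{\sigma_1^{(1)}, \sigma_2^{(1)}, \ldots, \sigma_{n_h}^{(1)}\},$$

$$P^{(4)} = Q^{(4)} = \Sigma^{(4)} = \mathrm{diag}\{\sigma_1^{(4)}, \sigma_2^{(4)}, \ldots, \sigma_{n_v}^{(4)}\}.$$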

If the diagonal submatrices of P and Q are each positive definite (p.d.), a BL realization may be obtained [4]. Regarding this, we have

Lemma 3.5. Local reachability and observability of $\{A, B, C, D\}$ and $\{\bar A, \bar B, \bar C, \bar D\}$ are equivalent. Moreover,


Here, $R^{(i)*} R^{(i)} = \tau_h \tau_v Q^{(i)}$ and $S^{(i)} S^{(i)*} = \tau_h \tau_v P^{(i)}$.

IV.  Coefficient  Sensitivity 

By generalizing a certain sensitivity measure, Lutz and Hakimi [9] have addressed sensitivity minimization of MIMO 1-D CT systems. The SISO 2-D q-operator case is treated in [7]. In what follows, we study the coefficient sensitivity of the 2-D δ-model of Section III. We follow a more direct approach using a Kronecker product formulation and, hence, the results are applicable to the more general MIMO case. Using [1], we may show

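$$S_A(c_h, c_v) = [I_n \otimes G] \cdot U_{n \times n} \cdot [I_n \otimes F] \qquad (4.1)$$

$$S_B(c_h, c_v) = [I_n \otimes G] \cdot U_{n \times p} \qquad (4.2)$$

$$S_C(c_h, c_v) = U_{q \times n} \cdot [I_n \otimes F] \qquad (4.3)$$

$$S_D(c_h, c_v) = U_{q \times p} \qquad (4.4)$$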

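Lemma 4.1. The quantities in (4.1)-(4.4) are given as

$$S_A = \begin{bmatrix} g_1 f_1^* & \cdots & g_1 f_n^* \\ \vdots & & \vdots \\ g_n f_1^* & \cdots & g_n f_n^* \end{bmatrix}; \quad S_B = \begin{bmatrix} G_{1,1} & \cdots & G_{1,p} \\ \vdots & & \vdots \\ G_{n,1} & \cdots & G_{n,p} \end{bmatrix};$$

$$S_C = \begin{bmatrix} F_{1,1} & \cdots & F_{1,n} \\ \vdots & & \vdots \\ F_{q,1} & \cdots & F_{q,n} \end{bmatrix}; \quad S_D = \begin{bmatrix} E_{1,1} & \cdots & E_{1,p} \\ \vdots & & \vdots \\ E_{q,1} & \cdots & E_{q,p} \end{bmatrix}.$$

Here, $F_{i,j}$ denotes a $(q \times p)$ null matrix except its i-th row, which is $f_j^*$, and $G_{i,j}$ denotes a $(q \times p)$ null matrix except its j-th column, which is $g_i$.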

Proof.  This  may  be  shown  through  the  results  in  [1]  and 
simple  yet  tedious  algebraic  manipulations.  ■ 

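Corollary 4.2. The quantities $S_A$, $S_B$, $S_C$, and $S_D$ of the δ-model and the quantities $\bar S_A$, $\bar S_B$, $\bar S_C$, and $\bar S_D$ of the corresponding q-model are related by $S_A|_{c \to z} = \tilde\Gamma \bar S_A$, $S_B|_{c \to z} = \tilde\Gamma \bar S_B$, $S_C|_{c \to z} = \bar S_C$, and $S_D|_{c \to z} = \bar S_D$, where $\tilde\Gamma = \tau_h I_{q n_h} \oplus \tau_v I_{q n_v} \in \mathbb{R}^{nq \times nq}$.

Proof. Apply (3.14) to Lemma 4.1. ■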

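To proceed further, we utilize the following

Definition 4.1. Let $H(c_h, c_v)$ be a bivariate matrix-valued function that is analytic on $\mathcal{T}_c^2$. Then,

$$\|H(c_h, c_v)\|_p \triangleq \left[ \frac{1}{(2\pi j)^2} \oint\!\!\oint_{\mathcal{T}_z^2} \big\| H(c_h, c_v)\big|_{c \to z} \big\|_F^p \, \frac{dz_h}{z_h} \frac{dz_v}{z_v} \right]^{1/p}, \quad p = 1, 2.$$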

Remark. This norm is extensively utilized in related work [7], due mainly to the fact that it leads to tractable results. This, and our desire to make a comparison with the corresponding q-model, are the primary reasons for its use here.

We now define the absolute sensitivity measure

$$M \triangleq \|S_A\|_1^2 + \frac{1}{p}\|S_B\|_2^2 + \frac{1}{q}\|S_C\|_2^2 + \frac{1}{pq}\|S_D\|_2^2. \qquad (4.5)$$

Remarks. 

1. The use of different norms is for mathematical feasibility and tractability [7], [5].

2. The weights associated with each term in (4.5) may be thought of as averaging factors per input/output.

3. Due to (3.5), M should contain $\|S_{\tau_h}\|$ and $\|S_{\tau_v}\|$. However, we assume that $\tau_h$ and $\tau_v$ are selected such that each possesses an exact binary representation. Hence, these additional terms are neglected.

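Using an argument similar to that in [7], one may show the following:

$$\|S_A\|_1^2 \le \mathrm{trace}[\bar P] \cdot \mathrm{trace}[\Gamma \bar Q \Gamma] \qquad (4.6)$$

$$\|S_B\|_2^2 = p \cdot \mathrm{trace}[\Gamma \bar Q \Gamma] \qquad (4.7)$$

$$\|S_C\|_2^2 = q \cdot \mathrm{trace}[\bar P] \qquad (4.8)$$

$$\|S_D\|_2^2 = pq \qquad (4.9)$$

Combining (4.5) with (4.6)-(4.9), we get

$$M \le \widetilde M \triangleq (\mathrm{trace}[\bar P] + 1)(\mathrm{trace}[\Gamma \bar Q \Gamma] + 1). \qquad (4.10)$$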

It is customary to perform a minimization of $\widetilde M$. Hence, one attempts to characterize those $\{A, B, C, D\}$ that are 'bound optimal' with respect to M. Analogous to the 2-D q-systems case [7], one may, for instance, show that a BL realization (modulo an orthogonal nonsingular transformation) is 'bound optimal' with respect to M.

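Compared to a q-system, its δ-system counterpart yields a smaller bound $\widetilde M$ whenever $\mathrm{trace}[\bar Q] > \mathrm{trace}[\Gamma \bar Q \Gamma]$, that is,

$$(1 - \tau_h^2) \cdot \mathrm{trace}[\bar Q^{(1)}] + (1 - \tau_v^2) \cdot \mathrm{trace}[\bar Q^{(4)}] > 0. \qquad (4.11)$$

Note that, with the local reachability and observability assumption of $\{A, B, C, D\}$, p.d. of $Q^{(1)}$ and $Q^{(4)}$ (and hence of $\bar Q^{(1)}$ and $\bar Q^{(4)}$) is guaranteed. Thus, (4.11) is satisfied if $\tau_h < 1$ and $\tau_v < 1$.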
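The comparison behind (4.11) involves only traces, so it can be sketched with plain numbers. The trace values below are hypothetical. The helper assumes that both bounds have the form (4.10), $(\mathrm{trace}[P] + 1)(\mathrm{trace}[\cdot] + 1)$, with the q-model using the full trace of the observability gramian. It also uses $\mathrm{trace}[\Gamma Q \Gamma] = \tau_h^2\,\mathrm{trace}[Q^{(1)}] + \tau_v^2\,\mathrm{trace}[Q^{(4)}]$, which holds because Γ is block diagonal with scalar multiples of identities.

```python
def sensitivity_bounds(tr_p, tr_q1, tr_q4, tau_h, tau_v):
    """Return (q-model bound, delta-model bound) of the form (4.10):
    (trace[P] + 1) * (trace[.] + 1), where the delta-model uses
    trace[Gamma Q Gamma] = tau_h^2 tr[Q1] + tau_v^2 tr[Q4]."""
    q_bound = (tr_p + 1.0) * (tr_q1 + tr_q4 + 1.0)
    d_bound = (tr_p + 1.0) * (tau_h ** 2 * tr_q1 + tau_v ** 2 * tr_q4 + 1.0)
    return q_bound, d_bound


def condition_4_11(tr_q1, tr_q4, tau_h, tau_v):
    """(1 - tau_h^2) tr[Q1] + (1 - tau_v^2) tr[Q4] > 0."""
    return (1.0 - tau_h ** 2) * tr_q1 + (1.0 - tau_v ** 2) * tr_q4 > 0.0


if __name__ == "__main__":
    tr_p, tr_q1, tr_q4 = 2.0, 3.0, 1.0   # hypothetical trace values
    for taus in ((0.5, 0.5), (1.5, 1.5)):
        print(taus, sensitivity_bounds(tr_p, tr_q1, tr_q4, *taus),
              condition_4_11(tr_q1, tr_q4, *taus))
```

With sampling times below one, the δ-model bound is the smaller of the two and (4.11) holds. With both sampling times above one, the comparison reverses.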

VII.  Conclusion 

We have developed the δ-operator analog of the Roesser local s.s. model. Notions of gramians and BL realization are also proposed. As expected, under mild conditions, this model offers superior coefficient sensitivity properties.